

Section: New Results

Pedestrian Detection: Training Set Optimization

Participants : Remi Trichet, Javier Ortiz.

keywords: computer vision, pedestrian detection, classifier training, data selection, data generation, data weighting, feature extraction

Figure 5. Training pipeline. The initial training set generation selects data while balancing negative and positive sample cardinalities. A cascade of classifiers is then trained on it, each independent classifier being learnt through bootstrapping. Balanced positive and negative sets are sought all along the cascade. The area of each circle is proportional to the cardinality of the set it represents.
IMG/training_pipeline.png
Figure 6. LBP Channel features pipeline.
IMG/Haar-LBP_pipeline.png

This year's work builds on the near real-time pedestrian detector introduced last year. Recall that the novelty of this detector lies mainly in its training set generation protocol, named FairTrain [39]. The methodology, illustrated in figure 5, consists of two distinct parts: the initial training set generation and the classifier training. The initial training set generation carefully selects data from a set of images while balancing negative and positive sample cardinalities. We then train a cascade of 1 to n classifiers. This cascade can consist of a cascade-of-rejectors [57], [143], [48], [118], [122], a soft cascade [99], or both. In addition, each independent classifier is learnt through bootstrapping [59], [69] to improve performance. One key aspect is to seek balanced positive and negative sets at all times. Hence, all along the cascade, the minority class is oversampled to create balanced positive and negative sets. See [39] for details.
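The balancing step described above can be illustrated with a minimal Python sketch. It assumes the simplest possible oversampling strategy, random duplication of minority samples, which is not necessarily the one used by FairTrain (see [39] for the actual protocol). The function name `balance_by_oversampling` and its `seed` parameter are hypothetical and chosen for illustration only.

```python
import random


def balance_by_oversampling(positives, negatives, seed=0):
    """Return (positives, negatives) with equal cardinality.

    Assumption: the minority class is oversampled by drawing extra
    samples with replacement; the majority class is left untouched.
    """
    rng = random.Random(seed)
    pos, neg = list(positives), list(negatives)
    if not pos or not neg:
        raise ValueError("both classes need at least one sample")

    # Pick the smaller list; it is extended in place below.
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    deficit = len(majority) - len(minority)

    # Duplicate randomly chosen minority samples until both sets match.
    minority.extend(rng.choices(minority, k=deficit))
    return pos, neg


if __name__ == "__main__":
    pos, neg = balance_by_oversampling(["p1", "p2"], ["n1", "n2", "n3", "n4", "n5"])
    print(len(pos), len(neg))
```

In a cascade, such a step would be applied again before training each stage, since the samples rejected by earlier stages change the class proportions.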

This year's improvement on this framework is two-fold: a refined experimental study and a Local Binary Pattern (LBP) channel descriptor.

First, in many respects, the construction of a training set remains similar to what it was at the birth of the domain; some related problems are not well studied and are sometimes still tackled empirically. This work studies the training conditions of pedestrian classifiers. More than a survey of existing training techniques, our experimentation highlights impactful parameters, potential new research directions, and combination dilemmas. These experiments allowed us to better understand and parametrize our pipeline.

Second, we introduce a 12-valued filter representation based on LBP. Various improvements now allow this texture feature to provide a very discriminative, yet compact descriptor. This new LBP-based channel descriptor outperforms channel features [65] while requiring a fraction of the original LBP memory footprint. Uniform patterns [100] and Haar-based LBP [56] are employed to shrink the filter dimension in accordance with our needs. In addition, cell stacking and a new restriction on filter combinations, based on proposal window coverage, are successfully applied. Finally, a more reliable feature selection technique is introduced to construct a lower-dimensional final descriptor without harming its discriminability. Experiments on the Inria and Caltech-USA datasets, presented in tables 1 and 2 respectively, validate this progress.
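To make the notion of uniform patterns concrete, the following Python sketch computes the classical 8-neighbour LBP code of a 3x3 patch and maps it to a compact bin index. It illustrates the standard uniform-pattern reduction only (59 bins for 8 neighbours); it does not reproduce the 12-valued representation described above, whose construction is not detailed in this section. All function names are hypothetical.

```python
def lbp_code(patch):
    """8-bit LBP code of a 3x3 patch given as 3 rows of 3 intensities."""
    center = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left pixel.
    ring = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
            patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, value in enumerate(ring):
        if value >= center:
            code |= 1 << bit
    return code


def is_uniform(code, bits=8):
    """True if the circular bit string has at most two 0/1 transitions."""
    # Rotate right by one bit; XOR marks positions where neighbours differ.
    rotated = (code >> 1) | ((code & 1) << (bits - 1))
    return bin(code ^ rotated).count("1") <= 2


def uniform_lookup(bits=8):
    """Map each raw code to a bin; all non-uniform codes share the last bin."""
    uniform_codes = [c for c in range(1 << bits) if is_uniform(c, bits)]
    shared_bin = len(uniform_codes)
    table = {c: shared_bin for c in range(1 << bits)}
    for index, c in enumerate(uniform_codes):
        table[c] = index
    return table


if __name__ == "__main__":
    table = uniform_lookup()
    patch = [[9, 1, 1], [1, 5, 1], [1, 1, 1]]
    print(lbp_code(patch), table[lbp_code(patch)], len(set(table.values())))
```

Collapsing the 256 raw codes into a few dozen bins is what shrinks the histogram dimension, at the cost of merging all non-uniform textures into a single bin.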

In light of these results, combining the FairTrain data selection pipeline with CNN features appears to be the obvious next step.